On the occurrence of boundary solutions in two-way incomplete tables


S. GHOSH AND P. VELLAISAMY 

Abstract. The analysis of incomplete contingency tables is an important problem, which is 
also of practical interest. In this paper, we consider boundary solutions under nonignorable 
nonresponse models in two-way incomplete tables with data on both variables missing. We 
establish a result similar to Park et al. (2014) on sufficient conditions for the occurrence of 
boundary solutions. We also provide a new result, which connects the forms of boundary 
solutions under various parameterizations of the missing data models. This result helps us to 
give the exact form of boundary solutions in the above tables, which improves a claim made 
in Baker et al. (1992) and avoids computational burden. A counterexample is provided to 
show that the sufficient conditions for the occurrence of boundary solutions are not necessary, 
thereby disproving a conjecture of Kim and Park (2014). Finally, we establish new necessary 
conditions for the occurrence of boundary solutions under nonignorable nonresponse models 
in square two-way incomplete tables, and show that they are not sufficient. These conditions 
are simple and easy to check as they depend only on the observed cell counts. They are useful 
and important for model selection also. Some real life data sets are analyzed for illustrating 
the results. 


1. Introduction 

Contingency tables with fully observed counts and partially classified margins (nonresponses)
are called incomplete tables. The following three types of missing data mechanisms
have been proposed in the literature (Little and Rubin (2002)): missing completely at random
(MCAR), missing at random (MAR) and not missing at random (NMAR). The missing
mechanism is said to be (a) MCAR when missingness is independent of both observed and 
unobserved data, (b) MAR when missingness depends only on observed data, and (c) NMAR 
if missingness depends only on unobserved data. Nonresponses are called ignorable when the 
missing data mechanism is MAR or MCAR, and the parameters governing the missing data 
mechanism are distinct from those to be estimated. They are nonignorable when the missing 
data mechanism is NMAR. 

Log-linear models have generally been used to study missing data mechanisms in incomplete 
tables (see Park et al. (2014) and references therein). However, under nonignorable models, a 
boundary solution occurs when the cell probabilities of non-respondents are estimated to be 
zeros for certain levels of the missing variables. Note that the problem of boundary solutions 
is an important one as it has serious consequences for statistical inference. For example, 
the observed counts cannot be reproduced by a perfect fit model (a model for which the 
estimated expected counts are equal to the observed counts) if boundary solutions occur. This 
implies that the fit is inadequate and the parameter estimates are imprecise. The maximum 
likelihood estimators (MLE’s) of the parameters lie on the boundary of the parameter space. 
The log-likelihood function is flat, due to which convergence of the EM algorithm to the
boundary MLE’s requires a lot of iterations. Also, the eigenvalues of the covariance matrix 
are inappropriate (either around zero or negative), which implies some parameter estimates 
have large estimated standard errors and wide confidence intervals. Hence, it is useful to study 
various forms of boundary solutions and explore conditions for their occurrence in incomplete 
tables. 

This problem was first considered by Baker and Laird (1988) who proposed a sufficient 
condition for the occurrence of boundary solutions in a 2 x 2 x 2 incomplete table. Baker et 
al. (1992) studied the problem for an I x J x 2 x 2 incomplete table, which has non-monotone
missing value patterns. For an I x J x 2 incomplete table with simple monotone missing value 
patterns, Smith et al. (1999) and Clarke (2002) described the problem geometrically, while 
Clarke and Smith (2005) discussed properties of MLE’s in case of boundary solutions. Park et 
al. (2014) proposed sufficient conditions for the occurrence of boundary solutions under various 
NMAR models in an I x I x 2 x 2 incomplete table. Recently, Ghosh and Vellaisamy (2016)
provided forms of boundary solutions in arbitrary three-way and n-dimensional incomplete 
tables with one or more variables missing, and also established sufficient conditions for their 
occurrence under various NMAR models. 

The purpose of this paper is to provide a comprehensive treatment of the problem of boundary
solutions in two-way incomplete tables with both variables missing. To this effect, we first
define boundary solutions that might occur under various NMAR models in such tables. We 
prove a new result that connects forms of boundary solutions under missing data models with 
various parameters. This helps us to obtain the exact boundary solutions in those models 
directly and hence avoid unnecessary calculations given in Baker et al. (1992). We provide 
sufficient conditions on the occurrence of boundary solutions in the above tables, which are 
similar to Park et al. (2014) but proved using direct arguments in a straightforward way. A 
counterexample is given to disprove a conjecture of Kim and Park (2014) on the necessity 
of the sufficient conditions. Finally, we establish new necessary conditions, using only the 
observed cell counts, for the occurrence of boundary solutions in the above tables. These conditions
prove very helpful for fitting appropriate models to the incomplete data. An example
is provided to show that these conditions are not sufficient. 

The rest of the paper is organized as follows. In Section 2, we introduce some notations and 
consider various identifiable NMAR log-linear models (Models [M1]-[M5]) for an I x J x 2 x 2 
incomplete table. The problem of boundary solutions, along with their forms under the above 
models, is discussed in Section 3. We formally define boundary solutions for an I x J x 2 x 2
incomplete table by extending the definition of Baker and Laird (1988). A new result is 
provided, which gives the relationship among forms of boundary solutions according to various 
parameterizations for the missing data models. In Section 4, we illustrate this result using 
some data analysis examples from Baker et al. (1992), thereby improving a claim made 
by them on the forms of boundary solutions in I x J x 2 x 2 tables as well as eliminating
computations. 

In Section 5, we prove a result on sufficient conditions for the occurrence of boundary 
solutions under Models [M1]-[M5], based on a similar approach but using direct arguments 
instead of contrapositive ones used in Park et al. (2014). A real life data analysis is carried out 
using our result. We verify the occurrence of boundary solutions directly using the definitions 
from Baker et al. (1992), and not the EM algorithm as in Park et al. (2014). An example is 
provided to show that the sufficient conditions for the occurrence of boundary solutions are
not necessary, which refutes a conjecture due to Kim and Park (2014). 

Finally, we propose necessary conditions for the occurrence of boundary solutions under 
Models [M1]-[M5] in square two-way incomplete tables, and later show that they are not 
sufficient through an example. Such conditions do not exist in the literature. Note that these 
conditions help us to identify the non-occurrence of boundary solutions, which is very useful 
for model selection. Also, these conditions involve only the observed cell counts and their sums 
in the tables, and hence can be easily verified. Section 6 provides some concluding remarks. 


2. NMAR LOG-LINEAR MODELS 

Suppose $Y_1$ and $Y_2$ are two categorical variables having $I$ and $J$ levels respectively. For
$i = 1, 2$, let $R_i$ denote the missing indicator for $Y_i$, so that $R_i = 1$ or $2$ if $Y_i$ is observed or
unobserved. Then we have an I x J x 2 x 2 incomplete table, corresponding to $Y_1$, $Y_2$, $R_1$ and
$R_2$, with cell counts $y = \{y_{ijkl}\}$, where $1 \le i \le I$, $1 \le j \le J$ and $1 \le k, l \le 2$. The vector of
observed counts is $y_{obs} = (\{y_{ij11}\}, \{y_{i+12}\}, \{y_{+j21}\}, y_{++22})$, where $\{y_{ij11}\}$ are the fully observed
counts and $\{y_{i+12}\}$, $\{y_{+j21}\}$, $y_{++22}$ are the supplementary margins, all of which are assumed to
be positive. Note that '+' denotes summation over levels of the corresponding variable. Let
$\pi = \{\pi_{ijkl}\}$ be the vector of cell probabilities, $\mu = \{\mu_{ijkl}\}$ be the vector of expected counts
and $N = \sum_{i,j,k,l} y_{ijkl}$ the total number of cell counts. For $I = J = 2$, we have the 2 x 2 x 2 x 2
incomplete table (Table 1).


Table 1. 2 x 2 x 2 x 2 Incomplete Table.

                              R_2 = 1              R_2 = 2
                         Y_2 = 1    Y_2 = 2      Y_2 missing
R_1 = 1   Y_1 = 1        y_1111     y_1211       y_1+12
          Y_1 = 2        y_2111     y_2211       y_2+12
R_1 = 2   Y_1 missing    y_+121     y_+221       y_++22


We consider Poisson sampling for convenience, that is, $Y_{ijkl} \sim P(\mu_{ijkl})$. The likelihood function
of $\mu$ is

$$L(\mu; y_{obs}) = \frac{e^{-\sum_{i,j,k,l}\mu_{ijkl}}}{\prod_{i,j,k,l} y_{ijkl}!} \prod_{i,j}\mu_{ij11}^{y_{ij11}} \prod_{i}\mu_{i+12}^{y_{i+12}} \prod_{j}\mu_{+j21}^{y_{+j21}}\, \mu_{++22}^{y_{++22}}, \tag{2.1}$$

so that the log-likelihood function of $\mu$ is

$$l(\mu; y_{obs}) = \sum_{i,j} y_{ij11}\log\mu_{ij11} + \sum_{i} y_{i+12}\log\mu_{i+12} + \sum_{j} y_{+j21}\log\mu_{+j21} + y_{++22}\log\mu_{++22} - \sum_{i,j,k,l}\mu_{ijkl} + \Delta, \tag{2.2}$$

where $\Delta$ is independent of the $\mu_{ijkl}$'s. For an I x J x 2 x 2 incomplete table, Baker et al. (1992)
proposed the following log-linear model (with no three-way or four-way interactions):

$$\log\mu_{ijkl} = \lambda + \lambda_{Y_1}(i) + \lambda_{Y_2}(j) + \lambda_{R_1}(k) + \lambda_{R_2}(l) + \lambda_{Y_1Y_2}(i,j) + \lambda_{Y_1R_1}(i,k) + \lambda_{Y_2R_1}(j,k) + \lambda_{Y_1R_2}(i,l) + \lambda_{Y_2R_2}(j,l) + \lambda_{R_1R_2}(k,l), \tag{2.3}$$










where the sum over any argument of a log-linear parameter is zero, for example, $\sum_i \lambda_{Y_1Y_2}(i,j) =
\sum_j \lambda_{Y_1Y_2}(i,j) = 0$. To study the various missing mechanisms of $Y_1$ and $Y_2$, Baker et al. (1992)
introduced the following notations:

$$a_{ij} = \frac{P(R_1 = 2, R_2 = 1 \mid Y_1 = i, Y_2 = j)}{P(R_1 = 1, R_2 = 1 \mid Y_1 = i, Y_2 = j)} = \frac{\pi_{ij21}}{\pi_{ij11}} = \frac{\mu_{ij21}}{\mu_{ij11}},$$

$$b_{ij} = \frac{P(R_1 = 1, R_2 = 2 \mid Y_1 = i, Y_2 = j)}{P(R_1 = 1, R_2 = 1 \mid Y_1 = i, Y_2 = j)} = \frac{\pi_{ij12}}{\pi_{ij11}} = \frac{\mu_{ij12}}{\mu_{ij11}},$$

$$m_{ij11} = N\pi_{ij11}, \qquad g = \frac{P(R_1 = 1, R_2 = 1 \mid Y_1 = i, Y_2 = j)\,P(R_1 = 2, R_2 = 2 \mid Y_1 = i, Y_2 = j)}{P(R_1 = 1, R_2 = 2 \mid Y_1 = i, Y_2 = j)\,P(R_1 = 2, R_2 = 1 \mid Y_1 = i, Y_2 = j)}.$$

Note that $m_{ij11} = \mu_{ij11}$ and $g$ denotes the odds ratio between the missing indicators of $Y_1$
and $Y_2$. Also, $\mu_{ij21} = m_{ij11}a_{ij}$, $\mu_{ij12} = m_{ij11}b_{ij}$ and $\mu_{ij22} = m_{ij11}a_{ij}b_{ij}g$. Note that $a_{ij}$ is the
conditional odds of $Y_1$ being missing given $Y_2$ is observed, while $b_{ij}$ is the conditional odds of
$Y_2$ being missing given $Y_1$ is observed. Here, $a_{ij}$ and $b_{ij}$ describe the missing mechanisms of $Y_1$
and $Y_2$, respectively. Under (2.3),
$a_{ij} = \exp[-2\{\lambda_{R_1}(1) + \lambda_{Y_1R_1}(i,1) + \lambda_{Y_2R_1}(j,1) + \lambda_{R_1R_2}(1,1)\}]$,
$b_{ij} = \exp[-2\{\lambda_{R_2}(1) + \lambda_{Y_1R_2}(i,1) + \lambda_{Y_2R_2}(j,1) + \lambda_{R_1R_2}(1,1)\}]$ and $g = \exp[4\lambda_{R_1R_2}(1,1)]$. Denote
$a_{ij}$ ($b_{ij}$) by $\alpha_{i.}$ ($\beta_{i.}$) or $\alpha_{.j}$ ($\beta_{.j}$) or $\alpha_{..}$ ($\beta_{..}$) if it depends only on $i$ or $j$ or none, respectively.
Then we have the following definition.

Definition 2.1. The missing mechanism of $Y_1$ under (2.3) is NMAR if $a_{ij} = \alpha_{i.}$, MAR if
$a_{ij} = \alpha_{.j}$ and MCAR if $a_{ij} = \alpha_{..}$. Similarly, the missing mechanism of $Y_2$ is NMAR if $b_{ij} = \beta_{.j}$,
MAR if $b_{ij} = \beta_{i.}$ and MCAR if $b_{ij} = \beta_{..}$.

Using Definition 2.1 and the above notations, there are nine possible identifiable models
(see pp. 647-648 of Baker et al. (1992)) based on different missing mechanisms for $Y_1$ and
$Y_2$. The equivalent log-linear models can be obtained as submodels of (2.3). As an example,
consider the model $(\alpha_{i.}, \beta_{i.})$, for which the missing mechanism is NMAR for $Y_1$ and MAR for
$Y_2$. Using the expressions of $a_{ij}$ and $b_{ij}$ above, the corresponding log-linear model is obtained
from (2.3) by substituting $\lambda_{Y_2R_1}(j,k) = \lambda_{Y_2R_2}(j,l) = 0$. The following are the five models
when the missing mechanism is NMAR for $Y_1$ or $Y_2$.

1. Model M1 (NMAR for $Y_1$, MCAR for $Y_2$):

$$\log\mu_{ijkl} = \lambda + \lambda_{Y_1}(i) + \lambda_{Y_2}(j) + \lambda_{R_1}(k) + \lambda_{R_2}(l) + \lambda_{Y_1Y_2}(i,j) + \lambda_{Y_1R_1}(i,k) + \lambda_{R_1R_2}(k,l)$$

2. Model M2 (NMAR for $Y_2$, MCAR for $Y_1$):

$$\log\mu_{ijkl} = \lambda + \lambda_{Y_1}(i) + \lambda_{Y_2}(j) + \lambda_{R_1}(k) + \lambda_{R_2}(l) + \lambda_{Y_1Y_2}(i,j) + \lambda_{Y_2R_2}(j,l) + \lambda_{R_1R_2}(k,l)$$

3. Model M3 (NMAR for $Y_1$, MAR for $Y_2$):

$$\log\mu_{ijkl} = \lambda + \lambda_{Y_1}(i) + \lambda_{Y_2}(j) + \lambda_{R_1}(k) + \lambda_{R_2}(l) + \lambda_{Y_1Y_2}(i,j) + \lambda_{Y_1R_1}(i,k) + \lambda_{Y_1R_2}(i,l) + \lambda_{R_1R_2}(k,l)$$

4. Model M4 (NMAR for $Y_2$, MAR for $Y_1$):

$$\log\mu_{ijkl} = \lambda + \lambda_{Y_1}(i) + \lambda_{Y_2}(j) + \lambda_{R_1}(k) + \lambda_{R_2}(l) + \lambda_{Y_1Y_2}(i,j) + \lambda_{Y_2R_1}(j,k) + \lambda_{Y_2R_2}(j,l) + \lambda_{R_1R_2}(k,l)$$

5. Model M5 (NMAR for both $Y_1$ and $Y_2$):

$$\log\mu_{ijkl} = \lambda + \lambda_{Y_1}(i) + \lambda_{Y_2}(j) + \lambda_{R_1}(k) + \lambda_{R_2}(l) + \lambda_{Y_1Y_2}(i,j) + \lambda_{Y_1R_1}(i,k) + \lambda_{Y_2R_2}(j,l) + \lambda_{R_1R_2}(k,l)$$

















Note that for Models [M1]-[M5], there is an association term between a variable and its
own missing indicator if the missing mechanism is NMAR for that variable (for example, the term
$\lambda_{Y_1R_1}(i,k)$ in Model [M1]), between the other variable and that missing indicator if the missing
mechanism is MAR for that variable (for example, the term $\lambda_{Y_2R_1}(j,k)$ in Model [M4]) and
none if the missing mechanism is MCAR for a variable (for example, $\lambda_{Y_1R_1}(i,k)$ and $\lambda_{Y_2R_1}(j,k)$
are absent in Model [M2]).
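
As a small numerical illustration of how the association terms determine the missing mechanism, the following sketch builds expected counts under Model [M1] for a 2 x 2 x 2 x 2 table and checks that a_ij depends only on i, while b_ij and g are constant. The parameter values and the helper names (`sign`, `mu_m1`, `odds`) are hypothetical and chosen only for illustration; levels 1 and 2 are coded as 0 and 1.

```python
import math

# Hypothetical parameter values (assumptions, not taken from the paper).
# With two levels per variable, the sum-to-zero constraints give
# lambda(2) = -lambda(1) and lambda(u, v) = lambda(1, 1) * sign(u) * sign(v).
LAM = 3.0                  # constant term
L_Y1, L_Y2 = 0.2, -0.1     # lambda_{Y1}(1), lambda_{Y2}(1)
L_R1, L_R2 = 1.0, 0.8      # lambda_{R1}(1), lambda_{R2}(1)
L_Y1Y2 = 0.3               # lambda_{Y1Y2}(1, 1)
L_Y1R1 = 0.4               # lambda_{Y1R1}(1, 1)
L_R1R2 = 0.25              # lambda_{R1R2}(1, 1)


def sign(level):
    """+1 for the first level (coded 0), -1 for the second level (coded 1)."""
    return 1.0 if level == 0 else -1.0


def mu_m1(i, j, k, l):
    """Expected count mu_{ijkl} under Model [M1]; every index is 0 or 1."""
    log_mu = (LAM + L_Y1 * sign(i) + L_Y2 * sign(j)
              + L_R1 * sign(k) + L_R2 * sign(l)
              + L_Y1Y2 * sign(i) * sign(j)
              + L_Y1R1 * sign(i) * sign(k)
              + L_R1R2 * sign(k) * sign(l))
    return math.exp(log_mu)


def odds(i, j):
    """Return (a_ij, b_ij, g) computed directly from the expected counts."""
    m11 = mu_m1(i, j, 0, 0)
    a = mu_m1(i, j, 1, 0) / m11
    b = mu_m1(i, j, 0, 1) / m11
    g = m11 * mu_m1(i, j, 1, 1) / (mu_m1(i, j, 0, 1) * mu_m1(i, j, 1, 0))
    return a, b, g


for i in (0, 1):
    for j in (0, 1):
        print(i + 1, j + 1, odds(i, j))
```

Under these assumed values, a_ij changes with i but not with j (NMAR for Y_1), and b_ij is the same in every cell (MCAR for Y_2), in line with Definition 2.1.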

3. Boundary solutions in NMAR models 

In this section, we consider boundary solutions under nonignorable nonresponse (NMAR)
models for an I x J x 2 x 2 incomplete table. We first define boundary solutions under the
above models and then present a result relating the forms of boundary solutions in terms of
various parameterizations of the models.

For an incomplete table, boundary solutions in NMAR models occur when the MLE's of
nonresponse cell probabilities are all zeros for certain levels of the missing variables. For
an I x J x 2 incomplete table, where data on only $Y_2$ is missing, Baker and Laird (1988)
defined boundary solutions in the NMAR model for $Y_2$ as $\hat\pi_{ij2} = 0$ for at least one pair $(i,j)$.
For the same model, Clarke and Smith (2005) showed that boundary solutions are given by
$\hat\pi_{+j2} = 0$ for at least one and at most $(J - 1)$ values of $Y_2$. Baker and Laird (1988) defined a
nonresponse boundary solution under NMAR models in general to be a stationary point that
lies on a boundary of the space of parameters modeling the nonignorable nonresponse. Using
this, we may extend their definition to an I x J x 2 x 2 table as follows.

Definition 3.1. Consider an I x J x 2 x 2 incomplete table, and let $1 \le i \le I$, $1 \le j \le J$
and $k, l = 1, 2$. Then we have the following.

1. A nonresponse boundary solution under the NMAR models for $Y_1$ only, that is, Models
[M1] and [M3], is an MLE given by $\hat\pi_{ij2l} = 0$ for at least one combination $(i, j, l)$.

2. A nonresponse boundary solution under the NMAR models for $Y_2$ only, that is, Models
[M2] and [M4], is an MLE given by $\hat\pi_{ijk2} = 0$ for at least one combination $(i, j, k)$.

3. A nonresponse boundary solution under the NMAR model for both $Y_1$ and $Y_2$, that is,
Model [M5], is an MLE given by $\hat\pi_{ij2l} = 0$ for at least one combination $(i, j, l)$ or $\hat\pi_{ijk2} = 0$ for
at least one combination $(i, j, k)$.

Note that in the literature, boundary solutions have usually been defined in terms of cell 
probabilities because the cell probabilities are in some sense natural to the model for the 
incomplete table, whereas the loglinear parameters are not. The next proposition explores 
the relationships among boundary solutions under Models [M1]-[M5] in terms of MLE’s of 
nonresponse cell probabilities, some specific log-linear parameters and $\hat\alpha_{i.}$ or $\hat\beta_{.j}$ for two-way
incomplete tables with both variables missing. 

Proposition 3.1. For an I x J x 2 x 2 incomplete table, we have the following.

1. For Models [M1] and [M3], if boundary solutions occur, then they are given by
$\hat\lambda_{Y_1R_1}(i,2) = -\infty \iff \hat\pi_{i+2+} = 0 \iff \hat\alpha_{i.} = 0$ for at least one and at most $(I - 1)$ values of $Y_1$.

2. For Models [M2] and [M4], if boundary solutions occur, then they are given by
$\hat\lambda_{Y_2R_2}(j,2) = -\infty \iff \hat\pi_{+j+2} = 0 \iff \hat\beta_{.j} = 0$ for at least one and at most $(J - 1)$ values of $Y_2$.

3. For Model [M5], if boundary solutions occur, then they are given by
$\hat\lambda_{Y_1R_1}(i,2) = -\infty$ or $\hat\lambda_{Y_2R_2}(j,2) = -\infty \iff \hat\pi_{i+2+} = 0$ or $\hat\pi_{+j+2} = 0 \iff \hat\alpha_{i.} = 0$ for at least
one and at most $(I - 1)$ values of $Y_1$ or $\hat\beta_{.j} = 0$ for at least one and at most $(J - 1)$ values of $Y_2$.

Proof. From Definition 3.1, it follows that if boundary solutions occur under the Models [M1]-[M5],
then the MLE's of the cell probabilities except some of the nonresponse ones are all
non-zero. On substituting $k = l = 1$ (for response cell probabilities) in the above models
and using the parameter constraints, we can then deduce that the MLE's of the constant, the
main effects and the association terms between $Y_i$'s, between $R_i$'s, and between $Y_i$ and $R_j$ for
$i \ne j$ are all finite. This is because non-zero terms (response cell probabilities) on the LHS of
the log-linear models imply that the log-linear parameters on the RHS are finite.

Consider part 1 first. For the Models [M1] and [M3], the log-linear parameters modelling the
nonignorable nonresponse (NMAR) mechanism of $Y_1$ are $\lambda_{R_1}(k)$ and $\lambda_{Y_1R_1}(i,k)$. If boundary
solutions occur, then they are of the form $\hat\pi_{ij2l} = 0$ (see point 1 of Definition 3.1), which
implies $\hat\lambda_{Y_1R_1}(i,2) = -\infty$ for at least one $i$ since the other parameters are finite as mentioned
above. Then under Model [M1], we have

$$\hat\pi_{i+2+} = \sum_{j,l}\hat\pi_{ij2l} = N^{-1}\sum_{j,l}\exp\{\hat\lambda + \hat\lambda_{Y_1}(i) + \hat\lambda_{Y_2}(j) + \hat\lambda_{R_1}(2) + \hat\lambda_{R_2}(l) + \hat\lambda_{Y_1R_1}(i,2) + \hat\lambda_{Y_1Y_2}(i,j) + \hat\lambda_{R_1R_2}(2,l)\} = 0$$

for at least one $i$. Conversely, we have

$$\hat\pi_{i+2+} = 0 \ \text{(for at least one } i\text{)}$$
$$\Rightarrow\ N^{-1}\sum_{j,l}\exp\{\hat\lambda + \hat\lambda_{Y_1}(i) + \hat\lambda_{Y_2}(j) + \hat\lambda_{R_1}(2) + \hat\lambda_{R_2}(l) + \hat\lambda_{Y_1R_1}(i,2) + \hat\lambda_{Y_1Y_2}(i,j) + \hat\lambda_{R_1R_2}(2,l)\} = 0$$
$$\Rightarrow\ \hat\lambda_{Y_1R_1}(i,2) = -\infty \ \text{for at least one } i,$$

so that $\hat\lambda_{Y_1R_1}(i,2) = -\infty \iff \hat\pi_{i+2+} = 0$ for at least one $i$ under Model [M1]. The same can
be shown for Model [M3]. Under Models [M1] and [M3], $a_{ij} = \exp[2\{\lambda_{R_1}(2) + \lambda_{Y_1R_1}(i,2) +
\lambda_{R_1R_2}(2,1)\}]$. Since $a_{ij}$ depends only on $i$, we have $a_{ij} = \alpha_{i.}$. It is clear that $\hat\alpha_{i.} = 0 \iff
\hat\lambda_{Y_1R_1}(i,2) = -\infty$. Also, note that by definition of $a_{ij}$, if $\hat\alpha_{i.} = 0$ for all $1 \le i \le I$, then
$y_{+j21} = 0$ for all $1 \le j \le J$, which is a contradiction since supplementary margins are
assumed to be positive. Hence, under Models [M1] and [M3], boundary solutions are given by
$\hat\lambda_{Y_1R_1}(i,2) = -\infty \iff \hat\pi_{i+2+} = 0 \iff \hat\alpha_{i.} = 0$ for at least one and at most $(I - 1)$ values of $Y_1$.

Consider part 2 now. Under Models [M2] and [M4], the log-linear parameters modelling the
NMAR mechanism of $Y_2$ are $\lambda_{R_2}(l)$ and $\lambda_{Y_2R_2}(j,l)$. Also, $b_{ij} = \exp[2\{\lambda_{R_2}(2) + \lambda_{Y_2R_2}(j,2) +
\lambda_{R_1R_2}(1,2)\}]$. Since $b_{ij}$ depends only on $j$, we have $b_{ij} = \beta_{.j}$. Then it can be shown similarly
as above that boundary solutions in this case are given by $\hat\lambda_{Y_2R_2}(j,2) = -\infty \iff \hat\pi_{+j+2} = 0 \iff
\hat\beta_{.j} = 0$ for at least one and at most $(J - 1)$ values of $Y_2$.

Finally, consider part 3. Under Model [M5], the log-linear parameters modelling the NMAR
mechanisms of $Y_1$ and $Y_2$ are $\lambda_{R_1}(k)$, $\lambda_{R_2}(l)$, $\lambda_{Y_1R_1}(i,k)$ and $\lambda_{Y_2R_2}(j,l)$. The proof for the
form of boundary solutions under Model [M5] follows on similar lines as for Models [M1]-[M4]
shown above. □

From the proof of Proposition 3.1, note that the one-to-one relation between the cell probabilities
and the log-linear parameters cannot be used to derive the connection between the
different forms of boundary solutions. This is because it is not obvious which specific log-linear
parameters have infinite MLE's just by noting the zero MLE's of the nonresponse cell
probabilities when boundary solutions occur.


4. Some examples of boundary solutions in NMAR models 


In this section, we reanalyze some examples in Baker et al. (1992), illustrating the result
in Section 3. We use Proposition 3.1 to investigate a claim made by Baker et al. (1992)
regarding forms and occurrence of boundary solutions in an I x J x 2 x 2 incomplete table.
This improvement is useful as it avoids computation and provides the exact boundary solutions
under an NMAR model by simply noting the level(s) of the variable(s) for which the MLE's
of the parameters are negative or infinite.

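First, we present the correct expression of the likelihood ratio statistic for missing data
models in such a table. Consider testing the goodness of fit of a null model (here one of
the Models [M1]-[M5]) against the alternative model (perfect fit model). Let $\{\hat\mu_{ijkl}\}$ and
$\{\tilde\mu_{ijkl}\}$ denote the MLE's of the expected counts under a null model and a perfect fit model
respectively. Also, let $L_0$ and $L_1$ denote the log-likelihoods for the null and the alternative
models, respectively. Then the likelihood ratio statistic is given by

$$G^2 = -2(L_0 - L_1)$$
$$= -2\left[\sum_{i,j} y_{ij11}\ln\left(\frac{\hat\mu_{ij11}}{\tilde\mu_{ij11}}\right) + \sum_i y_{i+12}\ln\left(\frac{\hat\mu_{i+12}}{\tilde\mu_{i+12}}\right) + \sum_j y_{+j21}\ln\left(\frac{\hat\mu_{+j21}}{\tilde\mu_{+j21}}\right) + y_{++22}\ln\left(\frac{\hat\mu_{++22}}{\tilde\mu_{++22}}\right) - \hat\mu_{++++} + \tilde\mu_{++++}\right]$$
$$= -2\left[\sum_{i,j} y_{ij11}\ln\left(\frac{\hat m_{ij11}}{y_{ij11}}\right) + \sum_i y_{i+12}\ln\left(\frac{\sum_j \hat m_{ij11}\hat b_{ij}}{y_{i+12}}\right) + \sum_j y_{+j21}\ln\left(\frac{\sum_i \hat m_{ij11}\hat a_{ij}}{y_{+j21}}\right) + y_{++22}\ln\left(\frac{\sum_{i,j}\hat m_{ij11}\hat a_{ij}\hat b_{ij}\hat g}{y_{++22}}\right) - \sum_{i,j}\hat m_{ij11}(1 + \hat a_{ij} + \hat b_{ij} + \hat a_{ij}\hat b_{ij}\hat g) + N\right]. \tag{4.1}$$

Note that the last two terms of (4.1) are missing in the expression of $G^2$ in Baker et al.
(1992) (see p. 646). Observe that in general, $\sum_{i,j}\hat m_{ij11}(1 + \hat a_{ij} + \hat b_{ij} + \hat a_{ij}\hat b_{ij}\hat g) \ne N$, unless the
hypothetical (null) model is a perfect fit model for example, in which case $G^2 = 0$.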
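
The likelihood ratio statistic above can be evaluated directly from the fitted quantities. The following is a minimal sketch, assuming all observed counts and all fitted margins are strictly positive; the function name `g2_statistic` and its argument layout are hypothetical and not taken from Baker et al. (1992).

```python
import math


def g2_statistic(y11, y_i12, y_j21, y22, m, a, b, g):
    """Likelihood ratio statistic G^2, including the last two terms.

    y11[i][j] : fully observed counts y_{ij11}
    y_i12[i]  : supplementary margin y_{i+12}
    y_j21[j]  : supplementary margin y_{+j21}
    y22       : supplementary margin y_{++22}
    m, a, b   : fitted m_{ij11}, a_{ij}, b_{ij} as I x J nested lists
    g         : fitted odds ratio between the missing indicators
    """
    n_rows, n_cols = len(m), len(m[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]

    # Fitted supplementary margins implied by (m, a, b, g).
    fit_i12 = [sum(m[i][j] * b[i][j] for j in range(n_cols)) for i in range(n_rows)]
    fit_j21 = [sum(m[i][j] * a[i][j] for i in range(n_rows)) for j in range(n_cols)]
    fit_22 = sum(m[i][j] * a[i][j] * b[i][j] * g for i, j in cells)

    fitted_total = sum(m[i][j] for i, j in cells) + sum(fit_i12) + sum(fit_j21) + fit_22
    observed_total = sum(y11[i][j] for i, j in cells) + sum(y_i12) + sum(y_j21) + y22

    log_terms = sum(y11[i][j] * math.log(m[i][j] / y11[i][j]) for i, j in cells)
    log_terms += sum(y * math.log(f / y) for y, f in zip(y_i12, fit_i12))
    log_terms += sum(y * math.log(f / y) for y, f in zip(y_j21, fit_j21))
    log_terms += y22 * math.log(fit_22 / y22)

    return -2.0 * (log_terms - fitted_total + observed_total)
```

When the fitted margins reproduce the observed counts exactly, every logarithm vanishes and the fitted total equals N, so the function returns zero, as expected for a perfect fit.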

Using Definition 2.1 and the notations in Section 2, Models [M1]-[M5] can be represented
as follows: Model [M1]: $(\alpha_{i.}, \beta_{..})$, Model [M2]: $(\alpha_{..}, \beta_{.j})$, Model [M3]: $(\alpha_{i.}, \beta_{i.})$, Model [M4]:
$(\alpha_{.j}, \beta_{.j})$ and Model [M5]: $(\alpha_{i.}, \beta_{.j})$. Accordingly, the expression of $G^2$ in (4.1) for each of
the above models may be obtained by making suitable substitutions and using the MLE's in
Baker et al. (1992) (see pp. 647-648). For example, the MLE's under the model $(\alpha_{i.}, \beta_{..})$ are

$$\hat m_{ij11} = \frac{y_{ij11}\,y_{i+1+}\,y_{++11}}{y_{i+11}\,y_{++1+}}, \qquad \sum_i \hat m_{ij11}\hat\alpha_{i.} = y_{+j21}, \qquad \hat\beta_{..} = \frac{y_{++12}}{y_{++11}}, \qquad \hat g = \frac{y_{++11}\,y_{++22}}{y_{++12}\,y_{++21}}.$$

Hence, from (4.1), the likelihood ratio statistic is

$$G^2 = -2\left[\sum_{i,j} y_{ij11}\ln\left(\frac{y_{i+1+}\,y_{++11}}{y_{i+11}\,y_{++1+}}\right) + \sum_i y_{i+12}\ln\left(\frac{y_{i+1+}\,y_{++12}}{y_{i+12}\,y_{++1+}}\right)\right].$$

Baker et al. (1992) mentioned that if any solution $\hat\alpha_{i.}$ or $\hat\beta_{.j}$ to the systems of equations (5.3)
and (5.4) is negative, then boundary solutions occur, that is, the MLE lies on the boundary
of the parameter space. Closed-form boundary MLE's under Models [M1]-[M5] may then be
obtained (see p. 649 of Baker et al. (1992)) by setting certain parameter estimates ($\hat\alpha_{i.}$ or
$\hat\beta_{.j}$) to 0 in the likelihood equations obtained from (2.2) for the models. They claimed that
counterintuitively, the parameter estimate set to 0 need not be the estimate with a negative
value as the solution to the above systems of equations. In particular, for a 2 x 2 x 2 x 2
incomplete table, they suggested examining both boundaries $\hat\alpha_{1.} = 0$ and $\hat\alpha_{2.} = 0$; similarly
$\hat\beta_{.1} = 0$ and $\hat\beta_{.2} = 0$, to determine the minimum value of $G^2$, which corresponds to the MLE.
We improve this claim and thereby obviate computations by showing that the MLE indeed
always occurs on the specific boundary (level(s) of the variable(s)) for which $\hat\alpha_{i.}$ or $\hat\beta_{.j}$ is
negative. In the next three examples, we use Proposition 3.1 to illustrate this point for Models
[M1]-[M5].


Example 4.1. Consider the data in Table 2 discussed in Baker et al. (1992), which cross-classifies
mother's self-reported smoking status ($Y_1$) ($Y_1 = 1\,(2)$ for smoker (non-smoker)) with
newborn's weight ($Y_2$) ($Y_2 = 1\,(2)$ if weight < 2500 grams ($\ge$ 2500 grams)). The supplementary
margins contain data on only smoking status, data on only newborn's weight and missing data
on both variables.


Table 2. Birth weight and smoking: observed counts.

                              R_2 = 1              R_2 = 2
                         Y_2 = 1    Y_2 = 2      Y_2 missing
R_1 = 1   Y_1 = 1         4512      21009          1049
          Y_1 = 2         3394      24132          1135
R_1 = 2   Y_1 missing      142        464          1224


Baker et al. (1992) mentioned that $\hat\alpha_{2.} < 0$ is obtained on fitting Models [M1], [M3] and [M5]
to the data in Table 2. They also reported that the value of $G^2$ corresponding to $\hat\alpha_{2.} = 0$ is larger than that
corresponding to $\hat\alpha_{1.} = 0$ for all the above models, which is incorrect as shown below. When
we fit the same models to the data in Table 2 using the 'MASS' package in R software, we
obtain $\hat\alpha_{1.} = 0.0493$ and $\hat\alpha_{2.} = -0.0237$ under Models [M1], [M3] and [M5], that is, boundary
solutions occur in each of the models.

Also, $G^2 = 55.2198\ (12.4682)$ under Model [M1], $G^2 = 55.2168\ (12.4638)$ under Model [M3]
and $G^2 = 55.214\ (12.464)$ under Model [M5] when $\hat\alpha_{1.} = 0$ ($\hat\alpha_{2.} = 0$). The $G^2$ values for $\hat\alpha_{2.} = 0$,
upon rounding off in each of the models, match those given in Table V of Baker et al. (1992).
Hence, $G^2$ is minimum for $\hat\alpha_{2.} = 0$ in each case, which implies that boundary solutions are
given by $\hat\alpha_{2.} = 0$ or equivalently $\hat\pi_{2+2+} = 0$. This result is consistent with points 1 and 3 of
Proposition 3.1. Further, it is the exact form of boundary solutions that we obtain on fitting
Models [M1], [M3] and [M5] to the data in Table 2 using the EM algorithm (see the 'ecm.cat'
function of 'cat' package in R software).
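
The estimates of $\hat\alpha_{i.}$ under Model [M1] reported in this example can be checked by hand. The sketch below, for a 2 x 2 x 2 x 2 table only, plugs the closed-form $\hat m_{ij11}$ under the model $(\alpha_{i.}, \beta_{..})$ into the linear system $\sum_i \hat m_{ij11}\hat\alpha_{i.} = y_{+j21}$ and solves it by Cramer's rule. The function names `solve_2x2` and `alpha_hat_m1` are hypothetical, and this is not the R code used for the reported fits.

```python
def solve_2x2(coeff, rhs):
    """Solve coeff[r][0]*x0 + coeff[r][1]*x1 = rhs[r] for r = 0, 1 (Cramer's rule)."""
    det = coeff[0][0] * coeff[1][1] - coeff[0][1] * coeff[1][0]
    x0 = (rhs[0] * coeff[1][1] - coeff[0][1] * rhs[1]) / det
    x1 = (coeff[0][0] * rhs[1] - coeff[1][0] * rhs[0]) / det
    return x0, x1


def alpha_hat_m1(y11, y_i12, y_j21):
    """Unconstrained solution (alpha_1., alpha_2.) under Model [M1] for I = J = 2."""
    row11 = [sum(row) for row in y11]                    # y_{i+11}
    row1p = [r + s for r, s in zip(row11, y_i12)]        # y_{i+1+}
    tot11, tot1p = sum(row11), sum(row1p)                # y_{++11}, y_{++1+}
    m_hat = [[y11[i][j] * row1p[i] * tot11 / (row11[i] * tot1p) for j in range(2)]
             for i in range(2)]
    # One equation per level j of Y_2; the unknowns are alpha_1. and alpha_2.
    coeff = [[m_hat[0][j], m_hat[1][j]] for j in range(2)]
    return solve_2x2(coeff, y_j21)


# Data of Table 2.
table2_alpha = alpha_hat_m1([[4512, 21009], [3394, 24132]], [1049, 1135], [142, 464])
print(table2_alpha)
```

A negative component in the returned pair signals that the unconstrained solution leaves the parameter space, so the MLE lies on the boundary.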

Example 4.2. Consider the example given in the last paragraph of p. 646 in Baker et al.
(1992). The Model [M1] was fitted to the following data: $y_{1111} = 100$, $y_{1211} = 40$, $y_{2111} = 50$,
$y_{2211} = 1000$, $y_{1+12} = 0$, $y_{2+12} = 0$, $y_{+121} = 100$, $y_{+221} = 10$ and $y_{++22} = 0$. They mentioned
that though $\hat\alpha_{1.} < 0$, $G^2$ is minimum for $\hat\alpha_{2.} = 0$, implying that the MLE is on the boundary
$\hat\alpha_{2.} = 0$. However, we obtain $\hat\alpha_{1.} = 1.0153\ (> 0)$ and $\hat\alpha_{2.} = -0.0306$ on fitting Model [M1]
to the above data. Also, note that $\hat g = \frac{y_{++11}\,y_{++22}}{y_{++12}\,y_{++21}}$ (see p. 649 of Baker et al. (1992)) is
undefined since $y_{++12} = 0$. Hence, we introduce the following changes: $y_{1+12} = 1$, $y_{2+12} = 1$
and $y_{++22} = 2$, as shown in Table 3.

Table 3.

                              R_2 = 1              R_2 = 2
                         Y_2 = 1    Y_2 = 2      Y_2 missing
R_1 = 1   Y_1 = 1          100         40             1
          Y_1 = 2           50       1000             1
R_1 = 2   Y_1 missing      100         10             2


On fitting Models [M1], [M3] and [M5] to the data in Table 3, we obtain $\hat\alpha_{1.} = 1.0098$ under
[M1], and $\hat\alpha_{1.} = 1.0153$ under [M3] and [M5], along with $\hat\alpha_{2.} = -0.0306$ under all the above
models, which implies boundary solutions occur in each case. Also, $G^2 = 426.1604\ (17.4704)$
under Model [M1], $G^2 = 424.3288\ (15.669)$ under Model [M3] and $G^2 = 424.3188\ (15.664)$
under Model [M5] when $\hat\alpha_{1.} = 0$ ($\hat\alpha_{2.} = 0$). Hence, $G^2$ is minimum for $\hat\alpha_{2.} = 0$ in each model,
which implies that boundary solutions are given by $\hat\pi_{2+2+} = 0$. This result is consistent with
points 1 and 3 of Proposition 3.1. Further, it is the exact form of boundary solutions that we
obtain on fitting Models [M1], [M3] and [M5] to the data in Table 3 using the EM algorithm.

Example 4.3. Consider the data in Table 2 discussed in Example 4.1. We introduce the
following changes corresponding to supplementary margins in Table 2: 464 -> 700 and 1135 ->
750. The modified table is shown in Table 4.

Table 4. Birth weight and smoking: observed counts (modified).

                              R_2 = 1              R_2 = 2
                         Y_2 = 1    Y_2 = 2      Y_2 missing
R_1 = 1   Y_1 = 1         4512      21009          1049
          Y_1 = 2         3394      24132           750
R_1 = 2   Y_1 missing      142        700          1224


When we fit the Models [M2], [M4] and [M5] to the data in Table 4, we obtain $\hat\beta_{.1} = 0.2538$
under [M2], and $\hat\beta_{.1} = 0.2543$ under [M4] and [M5], along with $\hat\beta_{.2} = -0.0047$ under all
the above models, that is, boundary solutions occur in each of the models. Also, $G^2 =
98.5962\ (3.3548)$ under Model [M2], $G^2 = 96.1622\ (0.922)$ under Model [M4] and $G^2 =
96.162\ (0.9276)$ under Model [M5] when $\hat\beta_{.1} = 0$ ($\hat\beta_{.2} = 0$). The $G^2$ values in brackets above
match those obtained using the EM algorithm. Hence, $G^2$ is minimum for $\hat\beta_{.2} = 0$ in each
case, which implies that boundary solutions are given by $\hat\beta_{.2} = 0$ or equivalently $\hat\pi_{+2+2} = 0$.
This result is consistent with points 2 and 3 of Proposition 3.1. Further, it is the exact form
of boundary solutions that we obtain on fitting Models [M2], [M4] and [M5] to the data in
Table 4 using the EM algorithm.
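
The same kind of check can be sketched for $\hat\beta_{.j}$ under Model [M2] with the data of Table 4. Here the closed form used for $\hat m_{ij11}$ is an assumption: it is obtained by interchanging the roles of $Y_1$ and $Y_2$ in the formula quoted earlier for the model $(\alpha_{i.}, \beta_{..})$, and is not quoted from Baker et al. (1992). The function name `beta_hat_m2` is hypothetical.

```python
def beta_hat_m2(y11, y_i12, y_j21):
    """Unconstrained solution (beta_.1, beta_.2) under Model [M2] for I = J = 2.

    Assumes m_hat_{ij11} = y_{ij11} * y_{+j+1} * y_{++11} / (y_{+j11} * y_{+++1}),
    the mirror image of the closed form for the model (alpha_i., beta_..).
    """
    col11 = [y11[0][j] + y11[1][j] for j in range(2)]    # y_{+j11}
    colp1 = [c + s for c, s in zip(col11, y_j21)]        # y_{+j+1}
    tot11, totp1 = sum(col11), sum(colp1)                # y_{++11}, y_{+++1}
    m_hat = [[y11[i][j] * colp1[j] * tot11 / (col11[j] * totp1) for j in range(2)]
             for i in range(2)]
    # One equation per level i of Y_1: sum_j m_hat[i][j] * beta_.j = y_{i+12}.
    det = m_hat[0][0] * m_hat[1][1] - m_hat[0][1] * m_hat[1][0]
    beta1 = (y_i12[0] * m_hat[1][1] - m_hat[0][1] * y_i12[1]) / det
    beta2 = (m_hat[0][0] * y_i12[1] - m_hat[1][0] * y_i12[0]) / det
    return beta1, beta2


# Data of Table 4.
print(beta_hat_m2([[4512, 21009], [3394, 24132]], [1049, 750], [142, 700]))
```

With the Table 4 counts, the second component is negative, which corresponds to the boundary $\hat\beta_{.2} = 0$ identified in this example.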

For an I x J x 2 x 2 incomplete table, Park et al. (2014) mentioned that boundary solutions
have at least one of the following forms:

(i) $\hat\pi_{i+2+} = 0$ for at least one and at most $(I - 1)$ values of $Y_1$,

(ii) $\hat\pi_{+j+2} = 0$ for at least one and at most $(J - 1)$ values of $Y_2$.

Specifically, only the first form ($\hat\pi_{i+2+} = 0$) may occur for Models [M1] and [M3], while only
the second form ($\hat\pi_{+j+2} = 0$) may occur for Models [M2] and [M4]. The boundary solutions
are given by $\hat\pi_{i+2+} = 0$ or $\hat\pi_{+j+2} = 0$ for Model [M5]. This is consistent with the forms of
boundary solutions under Models [M1]-[M5] in Proposition 3.1.


5. Conditions for the occurrence of boundary solutions 

In this section, we discuss sufficient conditions and also propose necessary conditions for the 
occurrence of boundary solutions in two-way incomplete tables with both variables missing. 
We show that the sufficient conditions are not necessary, which disproves a conjecture made 
by Kim and Park (2014). Further, we prove that the proposed necessary conditions are not 
sufficient. Both sets of conditions are simple to verify since they involve only the observed 
cell counts in the tables. The sufficient and the necessary conditions are of practical utility 
in identifying the occurrence and non-occurrence, respectively of boundary solutions in such 
tables. 


5.1. Sufficient conditions for the occurrence of boundary solutions. Following Park
et al. (2014), define the four odds based on the observed (joint/marginal) cell counts for any
pair $(j, j')$ of $Y_2$:

$$\nu_i(j,j') = \frac{\hat\pi_{ij11}}{\hat\pi_{ij'11}}, \quad \nu_n(j,j') = \min_i\{\nu_i(j,j')\}, \quad \nu_m(j,j') = \max_i\{\nu_i(j,j')\}, \quad \nu(j,j') = \frac{y_{+j21}}{y_{+j'21}}. \tag{5.1}$$

Similarly, for a given pair $(i, i')$ of $Y_1$, define the four odds using the observed cell counts:

$$\omega_j(i,i') = \frac{\hat\pi_{ij11}}{\hat\pi_{i'j11}}, \quad \omega_n(i,i') = \min_j\{\omega_j(i,i')\}, \quad \omega_m(i,i') = \max_j\{\omega_j(i,i')\}, \quad \omega(i,i') = \frac{y_{i+12}}{y_{i'+12}}. \tag{5.2}$$

Note that $\nu_i(j,j')$ and $\omega_j(i,i')$ are called the response odds, while $\nu(j,j')$ and $\omega(i,i')$ are called
the nonresponse odds. Using the MLE's of $\{\pi_{ij11}\}$ under Models [M1]-[M5] (see pp. 647-648
of Baker et al. (1992)), we deduce that $\nu_i(j,j') = \frac{y_{ij11}}{y_{ij'11}}$ and $\omega_j(i,i') = \frac{y_{ij11}}{y_{i'j11}}$, which involve only
the fully observed counts.

The following theorem provides sufficient conditions for the occurrence of boundary solutions
in Models [M1]-[M5]. We provide a proof which is similar to that of Theorem 1 of Park
et al. (2014), but we give direct arguments instead of using contrapositive ones as in Park et 
al. (2014). 


Theorem 5.1. Consider the following conditions for an I x I x 2 x 2 contingency table.

1. $\nu(j,j') \notin (\nu_n(j,j'), \nu_m(j,j'))$ for at least one pair $(j, j')$ of $Y_2$,

2. $\omega(i,i') \notin (\omega_n(i,i'), \omega_m(i,i'))$ for at least one pair $(i, i')$ of $Y_1$.

Then we have the following:

(a) Boundary solutions in NMAR models for only $Y_1$ (Models [M1] and [M3]) occur if
Condition 1 holds.

(b) Boundary solutions in NMAR models for only $Y_2$ (Models [M2] and [M4]) occur if
Condition 2 holds.

(c) Boundary solutions in the NMAR model for both $Y_1$ and $Y_2$ (Model [M5]) occur if
Condition 1 or Condition 2 holds.
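
Since both conditions involve only observed counts, they are straightforward to check numerically. The following sketch assumes that the response odds are computed from the fully observed counts, as deduced above, and that the interval in each condition is open; the function names `condition_1` and `condition_2` are hypothetical.

```python
def condition_1(y11, y_j21):
    """True if nu(j, j') lies outside (nu_n(j, j'), nu_m(j, j')) for some pair j != j'."""
    n_rows, n_cols = len(y11), len(y11[0])
    for j in range(n_cols):
        for jp in range(n_cols):
            if j == jp:
                continue
            response_odds = [y11[i][j] / y11[i][jp] for i in range(n_rows)]
            nonresponse_odds = y_j21[j] / y_j21[jp]
            if not (min(response_odds) < nonresponse_odds < max(response_odds)):
                return True
    return False


def condition_2(y11, y_i12):
    """True if omega(i, i') lies outside (omega_n(i, i'), omega_m(i, i')) for some i != i'."""
    n_rows, n_cols = len(y11), len(y11[0])
    for i in range(n_rows):
        for ip in range(n_rows):
            if i == ip:
                continue
            response_odds = [y11[i][j] / y11[ip][j] for j in range(n_cols)]
            nonresponse_odds = y_i12[i] / y_i12[ip]
            if not (min(response_odds) < nonresponse_odds < max(response_odds)):
                return True
    return False


counts = [[4512, 21009], [3394, 24132]]
# Table 2: Condition 1 holds, Condition 2 does not.
print(condition_1(counts, [142, 464]), condition_2(counts, [1049, 1135]))
# Table 4: Condition 2 holds, Condition 1 does not.
print(condition_1(counts, [142, 700]), condition_2(counts, [1049, 750]))
```

For Table 2 this agrees with the boundary solutions found under Models [M1], [M3] and [M5], and for Table 4 with those found under Models [M2], [M4] and [M5].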


Proof. From Baker et al. (1992), the MLE's $\hat\alpha_{i.}$ under the NMAR model for only $Y_1$ (Models
[M1] and [M3]) satisfy

$$\sum_i N\hat\pi_{ij11}\hat\alpha_{i.} = y_{+j21}, \quad \forall\ 1 \le j \le I, \tag{5.3}$$

while the MLE's $\hat\beta_{.j}$ under the NMAR model for only $Y_2$ (Models [M2] and [M4]) satisfy

$$\sum_j N\hat\pi_{ij11}\hat\beta_{.j} = y_{i+12}, \quad \forall\ 1 \le i \le I. \tag{5.4}$$

The MLE's $\hat\alpha_{i.}$ and $\hat\beta_{.j}$ under the NMAR model for both $Y_1$ and $Y_2$ (Model [M5]) satisfy both
(5.3) and (5.4). Note that boundary solutions in Models [M1] and [M3] occur if $\hat\alpha_{i.} < 0$ for at
least one and at most $(I - 1)$ values of $Y_1$, while boundary solutions in Models [M2] and [M4]
occur if $\hat\beta_{.j} < 0$ for at least one and at most $(I - 1)$ values of $Y_2$. Also note that boundary
solutions under [M5] occur if at least one of the following holds:

(i) $\hat\alpha_{i.} < 0$ for at least one and at most $(I - 1)$ values of $Y_1$,

(ii) $\hat\beta_{.j} < 0$ for at least one and at most $(I - 1)$ values of $Y_2$.

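From (5.1) and (5.3), we have

$$\nu(j,j') = \frac{y_{+j21}}{y_{+j'21}} = \frac{\sum_i \hat{\pi}_{ij11}\hat{\alpha}_{i\cdot}}{\sum_i \hat{\pi}_{ij'11}\hat{\alpha}_{i\cdot}},$$

so that

(5.5) $\nu_m(j,j') - \nu(j,j') = \dfrac{\sum_{i \ne m_1} (\hat{\pi}_{m_1 j11}\hat{\pi}_{ij'11} - \hat{\pi}_{m_1 j'11}\hat{\pi}_{ij11})\,\hat{\alpha}_{i\cdot}}{\hat{\pi}_{m_1 j'11} \sum_i \hat{\pi}_{ij'11}\hat{\alpha}_{i\cdot}},$

(5.6) $\nu(j,j') - \nu_n(j,j') = \dfrac{\sum_{i \ne n_1} (\hat{\pi}_{n_1 j'11}\hat{\pi}_{ij11} - \hat{\pi}_{n_1 j11}\hat{\pi}_{ij'11})\,\hat{\alpha}_{i\cdot}}{\hat{\pi}_{n_1 j'11} \sum_i \hat{\pi}_{ij'11}\hat{\alpha}_{i\cdot}},$

where $m_1$ and $n_1$ are the levels of $Y_1$ corresponding to $\nu_m(j,j')$ and $\nu_n(j,j')$ respectively. From (5.1), we get

(5.7) $\nu_n(j,j') = \dfrac{\hat{\pi}_{n_1 j11}}{\hat{\pi}_{n_1 j'11}} \le \nu_i(j,j') = \dfrac{\hat{\pi}_{ij11}}{\hat{\pi}_{ij'11}} \le \nu_m(j,j') = \dfrac{\hat{\pi}_{m_1 j11}}{\hat{\pi}_{m_1 j'11}}.$

From (5.7), we have the following inequalities

(5.8) $\hat{\pi}_{m_1 j11}\hat{\pi}_{ij'11} \ge \hat{\pi}_{m_1 j'11}\hat{\pi}_{ij11}$ and $\hat{\pi}_{n_1 j'11}\hat{\pi}_{ij11} \ge \hat{\pi}_{n_1 j11}\hat{\pi}_{ij'11}$ for $i \ne m_1, n_1$.

Consider part (a). Suppose Condition 1 holds, which implies that (5.5) and (5.6) are of opposite signs. Using this fact and (5.8), we observe that $\hat{\alpha}_{i\cdot} < 0$ for at least one and at most $(I-1)$ values of $Y_1$, that is, boundary solutions of the form $\hat{\pi}_{i+2+} = 0$ occur.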

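Again from (5.2) and (5.4), we have

$$\omega(i,i') = \frac{y_{i+12}}{y_{i'+12}} = \frac{\sum_j \hat{\pi}_{ij11}\hat{\beta}_{\cdot j}}{\sum_j \hat{\pi}_{i'j11}\hat{\beta}_{\cdot j}},$$

so that

(5.9) $\omega_m(i,i') - \omega(i,i') = \dfrac{\sum_{j \ne m_2} (\hat{\pi}_{i m_2 11}\hat{\pi}_{i'j11} - \hat{\pi}_{i' m_2 11}\hat{\pi}_{ij11})\,\hat{\beta}_{\cdot j}}{\hat{\pi}_{i' m_2 11} \sum_j \hat{\pi}_{i'j11}\hat{\beta}_{\cdot j}},$

(5.10) $\omega(i,i') - \omega_n(i,i') = \dfrac{\sum_{j \ne n_2} (\hat{\pi}_{i' n_2 11}\hat{\pi}_{ij11} - \hat{\pi}_{i n_2 11}\hat{\pi}_{i'j11})\,\hat{\beta}_{\cdot j}}{\hat{\pi}_{i' n_2 11} \sum_j \hat{\pi}_{i'j11}\hat{\beta}_{\cdot j}},$

where $m_2$ and $n_2$ are the levels of $Y_2$ corresponding to $\omega_m(i,i')$ and $\omega_n(i,i')$ respectively. From (5.2), we get

(5.11) $\omega_n(i,i') = \dfrac{\hat{\pi}_{i n_2 11}}{\hat{\pi}_{i' n_2 11}} \le \omega_j(i,i') = \dfrac{\hat{\pi}_{ij11}}{\hat{\pi}_{i'j11}} \le \omega_m(i,i') = \dfrac{\hat{\pi}_{i m_2 11}}{\hat{\pi}_{i' m_2 11}}.$

From (5.11), we have the following inequalities

(5.12) $\hat{\pi}_{i m_2 11}\hat{\pi}_{i'j11} \ge \hat{\pi}_{i' m_2 11}\hat{\pi}_{ij11}$ and $\hat{\pi}_{i' n_2 11}\hat{\pi}_{ij11} \ge \hat{\pi}_{i n_2 11}\hat{\pi}_{i'j11}$ for $j \ne m_2, n_2$.

Now consider part (b). Assume Condition 2 holds, which implies that (5.9) and (5.10) are of opposite signs. Using this fact and (5.12), we observe that $\hat{\beta}_{\cdot j} < 0$ for at least one and at most $(J-1)$ values of $Y_2$, that is, boundary solutions of the form $\hat{\pi}_{+j+2} = 0$ occur.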

Finally consider part (c). Assume at least one of Conditions 1 and 2 holds. The cases when only Condition 1 holds or only Condition 2 holds follow from the proofs of part (a) and part (b) respectively. So it is sufficient here to assume both Conditions 1 and 2 hold. This implies, from part (a), $\hat{\alpha}_{i\cdot} < 0$ for at least one and at most $(I-1)$ values of $Y_1$, that is, boundary solutions of the form $\hat{\pi}_{i+2+} = 0$ occur. Also from part (b), we have $\hat{\beta}_{\cdot j} < 0$ for at least one and at most $(J-1)$ values of $Y_2$, that is, boundary solutions of the form $\hat{\pi}_{+j+2} = 0$ occur. This completes the proof. □


In the following example, we use Theorem 5.1 to establish the occurrence of boundary solutions. The verification thereafter follows directly from the definition of boundary solutions in Baker et al. (1992) and Proposition 3.1.

Example 5.1. Consider Table 5 discussed in Park et al. (2014), which cross-classifies data on bone mineral density ($Y_1$) and family income ($Y_2$) in a $3 \times 3 \times 2 \times 2$ incomplete table. Both variables $Y_1$ and $Y_2$ have three levels. The total count is 2998, out of which data on both $Y_1$ and $Y_2$ are available for 1844 persons, data on $Y_1$ only for 231 persons, data on $Y_2$ only for 878 persons, and data on neither of them for 45 persons.

Table 5. Bone mineral density ($Y_1$) and family income ($Y_2$).

                           R_2 = 1                R_2 = 2
                   Y_2 = 1  Y_2 = 2  Y_2 = 3      Missing
R_1 = 1  Y_1 = 1     621      290      284          135
         Y_1 = 2     260      131      117           69
         Y_1 = 3      93       30       18           27
R_1 = 2  Missing     456      156      266           45


Tables 6 and 7 are from Park et al. (2014) in which odds for the various NMAR models in 
Table 5 are given. 

Table 6. Odds for the Models [M1], [M3] and [M5] in Table 5.

(j,j')   \nu_1(j,j')        \nu_2(j,j')        \nu_3(j,j')      \nu(j,j')
(1,2)    2.14 (= 621/290)   1.98 (= 260/131)   3.10 (= 93/30)   2.92 (= 456/156)
(1,3)    2.19 (= 621/284)   2.22 (= 260/117)   5.17 (= 93/18)   1.71 (= 456/266)
(2,3)    1.02 (= 290/284)   1.12 (= 131/117)   1.67 (= 30/18)   0.59 (= 156/266)


Table 7. Odds for the Models [M2], [M4] and [M5] in Table 5.

(i,i')   \omega_1(i,i')     \omega_2(i,i')     \omega_3(i,i')     \omega(i,i')
(1,2)    2.39 (= 621/260)   2.21 (= 290/131)   2.43 (= 284/117)   1.96 (= 135/69)
(1,3)    6.68 (= 621/93)    9.67 (= 290/30)    15.78 (= 284/18)   5.00 (= 135/27)
(2,3)    2.80 (= 260/93)    4.37 (= 131/30)    6.50 (= 117/18)    2.56 (= 69/27)


Let $A_\nu(j,j') = (\nu_n(j,j'), \nu_m(j,j'))$ and $A_\omega(i,i') = (\omega_n(i,i'), \omega_m(i,i'))$. Then from Tables 6 and 7, we observe that $A_\nu(1,2) = (1.98, 3.10)$, $A_\nu(1,3) = (2.19, 5.17)$, $A_\nu(2,3) = (1.02, 1.67)$, $A_\omega(1,2) = (2.21, 2.43)$, $A_\omega(1,3) = (6.68, 15.78)$ and $A_\omega(2,3) = (2.80, 6.50)$. Also, $\nu(1,2) \in A_\nu(1,2)$, $\nu(1,3) \notin A_\nu(1,3)$, $\nu(2,3) \notin A_\nu(2,3)$, $\omega(1,2) \notin A_\omega(1,2)$, $\omega(1,3) \notin A_\omega(1,3)$ and $\omega(2,3) \notin A_\omega(2,3)$, so that the sufficient conditions for the occurrence of boundary solutions in Theorem 5.1 are satisfied.
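
The interval check above can be reproduced mechanically from the counts in Table 5. The following is a minimal sketch, not the authors' code; the helper name `outside_pairs` and the variable names are illustrative assumptions. It lists the pairs whose nonresponse odds fall outside the open interval spanned by the response odds.

```python
from itertools import combinations

# Fully observed counts y_{ij11} of Table 5 (rows: Y1 levels, columns: Y2 levels).
y11 = [[621, 290, 284],
       [260, 131, 117],
       [93, 30, 18]]
col_margin = [456, 156, 266]  # y_{+j21}: Y1 missing, Y2 observed
row_margin = [135, 69, 27]    # y_{i+12}: Y1 observed, Y2 missing


def outside_pairs(table, margin):
    """Hypothetical helper: return the column pairs (1-based) whose
    nonresponse odds lie outside (min, max) of the row-wise response odds."""
    hits = []
    for j, k in combinations(range(len(margin)), 2):
        response = [row[j] / row[k] for row in table]
        nonresponse = margin[j] / margin[k]
        if not (min(response) < nonresponse < max(response)):
            hits.append((j + 1, k + 1))
    return hits


# Transposing the table turns pairs of Y2 levels into pairs of Y1 levels.
transposed = [list(col) for col in zip(*y11)]
cond1_pairs = outside_pairs(y11, col_margin)         # pairs (j, j') of Y2
cond2_pairs = outside_pairs(transposed, row_margin)  # pairs (i, i') of Y1
print(cond1_pairs, cond2_pairs)
```

A non-empty first list means Condition 1 of Theorem 5.1 holds, and a non-empty second list means Condition 2 holds.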

Hence, boundary solutions will occur when Models [M1]-[M5] are fitted to the data in Table 5. To verify this observation, we fit the above models to data in various subtables of Table 5. It is assumed that in a particular subtable, data on only the corresponding variable are missing, while data on the other variables are observed. The MLE's of the parameters, computed using the 'MASS' package in R, are shown in Table 8.

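Table 8. MLE's of parameters in subtables of Table 5.

Subtable   NMAR model   MLE's                                                                                              Boundary solutions
Y_1        [M1]         $\hat{\alpha}_{1\cdot} = 4.5205$, $\hat{\alpha}_{2\cdot} = -8.2411$, $\hat{\alpha}_{3\cdot} = -1.6019$   $\hat{\pi}_{2+2+} = \hat{\pi}_{3+2+} = 0$
Y_2        [M2]         $\hat{\beta}_{\cdot 1} = 0.1008$, $\hat{\beta}_{\cdot 2} = 1.2338$, $\hat{\beta}_{\cdot 3} = -0.8060$      $\hat{\pi}_{+3+2} = 0$
Y_1 Y_2    [M1]         $\hat{\alpha}_{1\cdot} = 4.5205$, $\hat{\alpha}_{2\cdot} = -8.2411$, $\hat{\alpha}_{3\cdot} = -1.6019$   $\hat{\pi}_{2+2+} = \hat{\pi}_{3+2+} = 0$
           [M3]         $\hat{\alpha}_{1\cdot} = 4.4716$, $\hat{\alpha}_{2\cdot} = -8.3197$, $\hat{\alpha}_{3\cdot} = -1.6962$   $\hat{\pi}_{2+2+} = \hat{\pi}_{3+2+} = 0$
Y_1 Y_2    [M2]         $\hat{\beta}_{\cdot 1} = 0.1008$, $\hat{\beta}_{\cdot 2} = 1.2338$, $\hat{\beta}_{\cdot 3} = -0.8060$      $\hat{\pi}_{+3+2} = 0$
           [M4]         $\hat{\beta}_{\cdot 1} = 0.1002$, $\hat{\beta}_{\cdot 2} = 1.1248$, $\hat{\beta}_{\cdot 3} = -0.8922$      $\hat{\pi}_{+3+2} = 0$
Y_1 Y_2    [M5]         $\hat{\alpha}_{1\cdot} = 4.4716$, $\hat{\alpha}_{2\cdot} = -8.3197$, $\hat{\alpha}_{3\cdot} = -1.6962$,  $\hat{\pi}_{2+2+} = \hat{\pi}_{3+2+} = 0$,
                        $\hat{\beta}_{\cdot 1} = 0.1002$, $\hat{\beta}_{\cdot 2} = 1.1248$, $\hat{\beta}_{\cdot 3} = -0.8922$      $\hat{\pi}_{+3+2} = 0$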


From the above table, we observe that in each subtable, at least one of $\hat{\alpha}_{i\cdot}$ and $\hat{\beta}_{\cdot j}$ is negative, which implies that boundary solutions occur. The forms of boundary solutions under the Models [M1]-[M5] are also the same as described in Section 3. Note that this check follows directly from definitions in Baker et al. (1992) and Proposition 3.1. There is no need to use the EM algorithm.
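
As an illustration of this direct check, the sketch below solves the linear systems (5.3) and (5.4) for the perfect fit models [M3], [M4] and [M5], assuming $N\hat{\pi}_{ij11} = y_{ij11}$ for these models. It uses NumPy rather than the R software used for Table 8, so it is an independent cross-check under that assumption and not the authors' computation; the variable names are illustrative.

```python
import numpy as np

# Fully observed counts y_{ij11} of Table 5 (rows: Y1 levels, columns: Y2 levels).
A = np.array([[621, 290, 284],
              [260, 131, 117],
              [93, 30, 18]], dtype=float)
b = np.array([456, 156, 266], dtype=float)      # y_{+j21}
b_star = np.array([135, 69, 27], dtype=float)   # y_{i+12}

# (5.3): sum_i y_{ij11} * alpha_i = y_{+j21}, i.e. A^T alpha = b.
alpha = np.linalg.solve(A.T, b)
# (5.4): sum_j y_{ij11} * beta_j = y_{i+12}, i.e. A beta = b*.
beta = np.linalg.solve(A, b_star)

# Levels (1-based) with negative solutions indicate boundary solutions.
negative_alpha_levels = (np.where(alpha < 0)[0] + 1).tolist()
negative_beta_levels = (np.where(beta < 0)[0] + 1).tolist()
print(np.round(alpha, 4), negative_alpha_levels)
print(np.round(beta, 4), negative_beta_levels)
```

The negative levels correspond to the boundary solutions reported for [M3], [M4] and [M5] in Table 8.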

5.2. The sufficient conditions are not necessary. The next example shows that the sufficient conditions for the occurrence of boundary solutions mentioned in Theorem 5.1 are not necessary. This result has not been discussed in the literature earlier. In fact, Kim and Park (2014) proved that the above conditions are both necessary and sufficient for a $2 \times 2 \times 2 \times 2$ incomplete table. They conjectured that a similar result would hold for general two-way incomplete tables as well.

Example 5.2. Consider Table 5 discussed in the previous example. We introduce the following changes corresponding to supplementary margins in Table 5: 266 → 125, 69 → 60 and 27 → 20. The modified table is shown in Table 9.

Table 9. Table 5a.

                           R_2 = 1                R_2 = 2
                   Y_2 = 1  Y_2 = 2  Y_2 = 3      Missing
R_1 = 1  Y_1 = 1     621      290      284          135
         Y_1 = 2     260      131      117           60
         Y_1 = 3      93       30       18           20
R_1 = 2  Missing     456      156      125           45


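From Table 9, $\nu(1,2) = 456/156 = 2.92$, $\nu(1,3) = 456/125 = 3.65$, $\nu(2,3) = 156/125 = 1.25$, $\omega(1,2) = 135/60 = 2.25$, $\omega(1,3) = 135/20 = 6.75$ and $\omega(2,3) = 60/20 = 3.00$. The intervals $A_\nu(j,j') = (\nu_n(j,j'), \nu_m(j,j'))$ and $A_\omega(i,i') = (\omega_n(i,i'), \omega_m(i,i'))$ of Example 5.1 are unchanged, since the fully observed counts are unchanged. Also, $\nu(1,2) \in A_\nu(1,2)$, $\nu(1,3) \in A_\nu(1,3)$, $\nu(2,3) \in A_\nu(2,3)$, $\omega(1,2) \in A_\omega(1,2)$, $\omega(1,3) \in A_\omega(1,3)$ and $\omega(2,3) \in A_\omega(2,3)$, so that the sufficient conditions for the occurrence of boundary solutions in Theorem 5.1 are not satisfied. The MLE's of the parameters obtained on fitting Models [M1]-[M5] in various subtables of Table 9 are shown in Table 10.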

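Table 10. MLE's of parameters in subtables of Table 9.

Subtable   NMAR model   MLE's                                                                                             Boundary solutions
Y_1        [M1]         $\hat{\alpha}_{1\cdot} = 0.6556$, $\hat{\alpha}_{2\cdot} = -1.0537$, $\hat{\alpha}_{3\cdot} = 3.4109$   $\hat{\pi}_{2+2+} = 0$
Y_2        [M2]         $\hat{\beta}_{\cdot 1} = 0.1355$, $\hat{\beta}_{\cdot 2} = 0.3420$, $\hat{\beta}_{\cdot 3} = -0.1846$     $\hat{\pi}_{+3+2} = 0$
Y_1 Y_2    [M1]         $\hat{\alpha}_{1\cdot} = 0.6556$, $\hat{\alpha}_{2\cdot} = -1.0537$, $\hat{\alpha}_{3\cdot} = 3.4109$   $\hat{\pi}_{2+2+} = 0$
           [M3]         $\hat{\alpha}_{1\cdot} = 0.6534$, $\hat{\alpha}_{2\cdot} = -1.0551$, $\hat{\alpha}_{3\cdot} = 3.4874$   $\hat{\pi}_{2+2+} = 0$
Y_1 Y_2    [M2]         $\hat{\beta}_{\cdot 1} = 0.1355$, $\hat{\beta}_{\cdot 2} = 0.3420$, $\hat{\beta}_{\cdot 3} = -0.1846$     $\hat{\pi}_{+3+2} = 0$
           [M4]         $\hat{\beta}_{\cdot 1} = 0.1421$, $\hat{\beta}_{\cdot 2} = 0.3289$, $\hat{\beta}_{\cdot 3} = -0.1712$     $\hat{\pi}_{+3+2} = 0$
Y_1 Y_2    [M5]         $\hat{\alpha}_{1\cdot} = 0.6534$, $\hat{\alpha}_{2\cdot} = -1.0551$, $\hat{\alpha}_{3\cdot} = 3.4874$,  $\hat{\pi}_{2+2+} = 0$,
                        $\hat{\beta}_{\cdot 1} = 0.1421$, $\hat{\beta}_{\cdot 2} = 0.3289$, $\hat{\beta}_{\cdot 3} = -0.1712$     $\hat{\pi}_{+3+2} = 0$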


From the above table, note that in each subtable, at least one of $\hat{\alpha}_{i\cdot}$ and $\hat{\beta}_{\cdot j}$ is negative, which implies that boundary solutions occur. The forms of boundary solutions under the Models [M1]-[M5] are also the same as described in Section 3. This shows that for an $I \times J \times 2 \times 2$ incomplete table, where $I, J \ge 3$, the sufficient conditions for the occurrence of boundary solutions under Models [M1]-[M5] in Theorem 5.1 are not necessary.
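
The counterexample can be checked numerically for the perfect fit models. The sketch below is illustrative only: it assumes $N\hat{\pi}_{ij11} = y_{ij11}$ as in Models [M3]-[M5], and the helper `any_outside` is a hypothetical name. It confirms that neither condition of Theorem 5.1 holds for Table 9, while the solutions of (5.3) and (5.4) still contain negative entries.

```python
import numpy as np
from itertools import combinations

# Counts from Table 9 (rows: Y1 levels, columns: Y2 levels).
A = np.array([[621, 290, 284],
              [260, 131, 117],
              [93, 30, 18]], dtype=float)
b = np.array([456, 156, 125], dtype=float)      # y_{+j21}
b_star = np.array([135, 60, 20], dtype=float)   # y_{i+12}


def any_outside(table, margin):
    """Hypothetical helper: True if, for some pair of columns, the
    nonresponse odds lie outside (min, max) of the response odds."""
    for j, k in combinations(range(len(margin)), 2):
        odds = table[:, j] / table[:, k]
        if not (odds.min() < margin[j] / margin[k] < odds.max()):
            return True
    return False


cond1 = any_outside(A, b)          # Condition 1 of Theorem 5.1
cond2 = any_outside(A.T, b_star)   # Condition 2 of Theorem 5.1
alpha = np.linalg.solve(A.T, b)    # system (5.3)
beta = np.linalg.solve(A, b_star)  # system (5.4)
print(cond1, cond2, np.round(alpha, 4), np.round(beta, 4))
```

Both flags are false, yet the second component of `alpha` and the third component of `beta` are negative, in line with Table 10.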

5.3. Necessary conditions for the occurrence of boundary solutions. We next state a result due to Kaykobad (1985), which will be used later to obtain a result on the occurrence of boundary solutions.

Lemma 5.1. Suppose $A = (a_{ij})$ is an $n \times n$ matrix with $a_{ij} \ge 0$ for $i \ne j = 1, 2, \ldots, n$ and $a_{ii} > 0$. Also, let $b = (b_j)$, where $b_j > 0$ for $1 \le j \le n$. If

(5.13) $b_i > \sum_{j=1,\, j \ne i}^{n} a_{ij} \dfrac{b_j}{a_{jj}}, \quad \forall\ 1 \le i \le n,$

then $A$ is invertible and $A^{-1}b > 0$.
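
To make condition (5.13) concrete, here is a small sketch that evaluates it and then solves the system. The function name `kaykobad_condition` and the $2 \times 2$ system are hypothetical illustrations, not taken from Kaykobad (1985) or from the data in this paper.

```python
import numpy as np


def kaykobad_condition(A, b):
    """Check (5.13): b_i > sum_{j != i} a_ij * b_j / a_jj for every i."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.diag(A)
    # A @ (b / d) includes the j = i term a_ii * b_i / a_ii = b_i; remove it.
    off_diagonal_sum = A @ (b / d) - b
    return bool(np.all(b > off_diagonal_sum))


# Toy system (assumed for illustration): nonnegative off-diagonal entries,
# positive diagonal, positive right-hand side.
A_toy = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
b_toy = np.array([5.0, 4.0])

holds = kaykobad_condition(A_toy, b_toy)
x = np.linalg.solve(A_toy, b_toy)
print(holds, x)
```

Here the condition holds and the solution is strictly positive, as the lemma guarantees.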

Using Lemma 5.1, the next result provides necessary conditions for the occurrence of boundary solutions under Models [M1]-[M5] in square two-way incomplete tables.

Theorem 5.2. For an $I \times I \times 2 \times 2$ incomplete table, consider the following conditions:

1. $y_{+j21} \le \sum_{i=1,\, i \ne j}^{I} \hat{\mu}_{ij11} \dfrac{y_{+i21}}{\hat{\mu}_{ii11}}$ for at least one $j = 1, 2, \ldots, I$,

2. $y_{i+12} \le \sum_{j=1,\, j \ne i}^{I} \hat{\mu}_{ij11} \dfrac{y_{j+12}}{\hat{\mu}_{jj11}}$ for at least one $i = 1, 2, \ldots, I$,

where $\hat{\mu}_{ij11}$ is the MLE of $\mu_{ij11}$. Also, let $\{\hat{\mu}_{ij11}\} > 0$, $\{y_{i+12}\} > 0$ and $\{y_{+j21}\} > 0$. Then we have the following:

(a) If boundary solutions under Models [M1] and [M3] occur, then only Condition 1 holds.

(b) If boundary solutions under Models [M2] and [M4] occur, then only Condition 2 holds.

(c) If boundary solutions under the Model [M5] occur, then Condition 1 or Condition 2 holds.

Proof. From Theorem 5.1, the MLE's $\hat{\alpha}_{i\cdot}$ and $\hat{\beta}_{\cdot j}$ under Model [M5] satisfy

(5.14) $\sum_i \hat{\mu}_{ij11}\hat{\alpha}_{i\cdot} = y_{+j21}$ for $j = 1, \ldots, I$,

(5.15) $\sum_j \hat{\mu}_{ij11}\hat{\beta}_{\cdot j} = y_{i+12}$ for $i = 1, \ldots, I$.

Also, the MLE's $\hat{\alpha}_{i\cdot}$ under Models [M1] and [M3] satisfy (5.14) only, while the MLE's $\hat{\beta}_{\cdot j}$ under Models [M2] and [M4] satisfy (5.15) only. Note that boundary solutions under [M5] occur if at least one of the following conditions holds:

(i) $\hat{\alpha}_{i\cdot} < 0$ for at least one and at most $(I-1)$ values of $Y_1$,

(ii) $\hat{\beta}_{\cdot j} < 0$ for at least one and at most $(I-1)$ values of $Y_2$.

Also, boundary solutions in Models [M1] and [M3] are given by only Condition (i), while boundary solutions in Models [M2] and [M4] are given by only Condition (ii). In Lemma 5.1, take $A = (\hat{\mu}_{ij11})$, $b = (b_j) = (y_{+j21})$ and $b^* = (b^*_i) = (y_{i+12})$ for $1 \le i \le I$, $1 \le j \le I$. Then (5.14) may be written as $A^T\hat{\alpha} = b$, while (5.15) may be written as $A\hat{\beta} = b^*$, where $\hat{\alpha} = (\hat{\alpha}_{i\cdot})$ and $\hat{\beta} = (\hat{\beta}_{\cdot j})$. We prove Theorem 5.2 by contrapositive.

Consider part (a) first. Suppose Condition 1 in Theorem 5.2 does not hold. Then by Lemma 5.1, $\hat{\alpha} = (A^T)^{-1}b > 0$. In other words, $\hat{\alpha}_{i\cdot} > 0$ for all $1 \le i \le I$, that is, boundary solutions under Models [M1] and [M3] do not occur.

Consider part (b) now. Assume Condition 2 in Theorem 5.2 does not hold. Then by Lemma 5.1, $\hat{\beta} = A^{-1}b^* > 0$. In other words, $\hat{\beta}_{\cdot j} > 0$ for all $1 \le j \le I$, that is, boundary solutions under Models [M2] and [M4] do not occur.

Finally consider part (c). Assume both Conditions 1 and 2 in Theorem 5.2 do not hold. Then by Lemma 5.1, both $\hat{\alpha}_{i\cdot} > 0$ and $\hat{\beta}_{\cdot j} > 0$ for all $1 \le i \le I$, $1 \le j \le I$, that is, boundary solutions under Model [M5] do not occur.

Hence, the result follows. □

Henceforth, we denote $A = (a_{ij}) = (\hat{\mu}_{ij11})$, $b = (b_j) = (y_{+j21})$ and $b^* = (b^*_i) = (y_{i+12})$ for $1 \le i \le I$, $1 \le j \le I$. The example below is an application of Theorem 5.2.


Example 5.3. From Table 9 in Example 5.2, we have the following:

$$A = \begin{pmatrix} 621 & 290 & 284 \\ 260 & 131 & 117 \\ 93 & 30 & 18 \end{pmatrix}, \quad b = (456, 156, 125), \quad b^* = (135, 60, 20).$$

The MLE's $\hat{\alpha} = (\hat{\alpha}_{i\cdot})$ and $\hat{\beta} = (\hat{\beta}_{\cdot j})$ under Model [M5] satisfy respectively the systems $A^T\hat{\alpha} = b$ from (5.14) and $A\hat{\beta} = b^*$ from (5.15) for $i, j = 1, 2, 3$. From Table 10, we observe that if Model [M5] is fitted to the data in Table 9, then we obtain $\hat{\alpha}_{2\cdot} < 0$ and $\hat{\beta}_{\cdot 3} < 0$, that is, boundary solutions occur. Now we need to verify if both Conditions 1 and 2 of Theorem 5.2 hold. For the matrix $A^T$ and the vector $b$ (with $a_{ij}$ denoting the entries of $A^T$), we have

$456 < a_{12} \times \dfrac{b_2}{a_{22}} + a_{13} \times \dfrac{b_3}{a_{33}} = 260 \times \dfrac{156}{131} + 93 \times \dfrac{125}{18} = 955.4516,$

$156 < a_{21} \times \dfrac{b_1}{a_{11}} + a_{23} \times \dfrac{b_3}{a_{33}} = 290 \times \dfrac{456}{621} + 30 \times \dfrac{125}{18} = 421.2802,$

$125 < a_{31} \times \dfrac{b_1}{a_{11}} + a_{32} \times \dfrac{b_2}{a_{22}} = 284 \times \dfrac{456}{621} + 117 \times \dfrac{156}{131} = 347.8693,$

so that Condition 1 in Theorem 5.2 is satisfied. Also, for the matrix $A$ and the vector $b^*$ (with $a_{ij}$ now denoting the entries of $A$), we have

$135 < a_{12} \times \dfrac{b^*_2}{a_{22}} + a_{13} \times \dfrac{b^*_3}{a_{33}} = 290 \times \dfrac{60}{131} + 284 \times \dfrac{20}{18} = 448.3800,$

$60 < a_{21} \times \dfrac{b^*_1}{a_{11}} + a_{23} \times \dfrac{b^*_3}{a_{33}} = 260 \times \dfrac{135}{621} + 117 \times \dfrac{20}{18} = 186.5217,$

$20 < a_{31} \times \dfrac{b^*_1}{a_{11}} + a_{32} \times \dfrac{b^*_2}{a_{22}} = 93 \times \dfrac{135}{621} + 30 \times \dfrac{60}{131} = 33.9578,$

so that Condition 2 in Theorem 5.2 is satisfied. Further, from Table 10, we observe that boundary solutions also occur if Models [M1]-[M4] are fitted to data in Table 9. Then only Condition 1 is satisfied if boundary solutions under [M1] and [M3] occur, while only Condition 2 is satisfied if boundary solutions under [M2] and [M4] occur. This is because the MLE $\hat{\alpha} = (\hat{\alpha}_{i\cdot})$ under Models [M1] and [M3] satisfies the system $A^T\hat{\alpha} = b$, while the MLE $\hat{\beta} = (\hat{\beta}_{\cdot j})$ under Models [M2] and [M4] satisfies the system $A\hat{\beta} = b^*$.
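
The right-hand sides in Example 5.3 can be recomputed in a few lines. This is a sketch under the assumption $\hat{\mu}_{ij11} = y_{ij11}$ (the perfect fit models); the helper name `off_diagonal_bound` is illustrative and not from the paper.

```python
import numpy as np

# Counts from Table 9.
A = np.array([[621, 290, 284],
              [260, 131, 117],
              [93, 30, 18]], dtype=float)
b = np.array([456, 156, 125], dtype=float)      # y_{+j21}
b_star = np.array([135, 60, 20], dtype=float)   # y_{i+12}


def off_diagonal_bound(M, v):
    """Return, for each i, sum_{j != i} M[i, j] * v[j] / M[j, j]."""
    # M @ (v / diag) also contains the j = i term, which equals v[i].
    return M @ (v / np.diag(M)) - v


bound1 = off_diagonal_bound(A.T, b)       # right-hand sides for Condition 1
bound2 = off_diagonal_bound(A, b_star)    # right-hand sides for Condition 2
condition1 = bool(np.any(b <= bound1))
condition2 = bool(np.any(b_star <= bound2))
print(np.round(bound1, 4), condition1)
print(np.round(bound2, 4), condition2)
```

Both flags are true for Table 9, which is consistent with the boundary solutions reported in Table 10.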
5.4. The necessary conditions are not sufficient. The next example shows that the necessary conditions for the occurrence of boundary solutions in Theorem 5.2 are not sufficient.

Example 5.4. In Example 5.3, replace 456 by 366 in $b$ and 20 by 15 in $b^*$, so that $b = (366, 156, 125)$ and $b^* = (135, 60, 15)$ now. For the matrix $A^T$ and the vector $b$ (with $a_{ij}$ denoting the entries of $A^T$), we have

$366 < a_{12} \times \dfrac{b_2}{a_{22}} + a_{13} \times \dfrac{b_3}{a_{33}} = 260 \times \dfrac{156}{131} + 93 \times \dfrac{125}{18} = 955.4516,$

$156 < a_{21} \times \dfrac{b_1}{a_{11}} + a_{23} \times \dfrac{b_3}{a_{33}} = 290 \times \dfrac{366}{621} + 30 \times \dfrac{125}{18} = 379.2512,$

$125 < a_{31} \times \dfrac{b_1}{a_{11}} + a_{32} \times \dfrac{b_2}{a_{22}} = 284 \times \dfrac{366}{621} + 117 \times \dfrac{156}{131} = 306.7099,$

so that Condition 1 in Theorem 5.2 is satisfied. Also, for the matrix $A$ and the vector $b^*$ (with $a_{ij}$ now denoting the entries of $A$), we have

$135 < a_{12} \times \dfrac{b^*_2}{a_{22}} + a_{13} \times \dfrac{b^*_3}{a_{33}} = 290 \times \dfrac{60}{131} + 284 \times \dfrac{15}{18} = 369.4911,$

$60 < a_{21} \times \dfrac{b^*_1}{a_{11}} + a_{23} \times \dfrac{b^*_3}{a_{33}} = 260 \times \dfrac{135}{621} + 117 \times \dfrac{15}{18} = 154.0217,$

$15 < a_{31} \times \dfrac{b^*_1}{a_{11}} + a_{32} \times \dfrac{b^*_2}{a_{22}} = 93 \times \dfrac{135}{621} + 30 \times \dfrac{60}{131} = 33.9578,$

so that Condition 2 in Theorem 5.2 is satisfied. Now, when we solve the system $A^T\hat{\alpha} = b$, we obtain the MLE's $\hat{\alpha}_{1\cdot} = 0.0133$, $\hat{\alpha}_{2\cdot} = 0.7796$ and $\hat{\alpha}_{3\cdot} = 1.6671$. So, there are no boundary solutions under Model [M3]. Similarly, the system $A\hat{\beta} = b^*$ yields the MLE's $\hat{\beta}_{\cdot 1} = 0.041$, $\hat{\beta}_{\cdot 2} = 0.3655$ and $\hat{\beta}_{\cdot 3} = 0.0126$, that is, there are no boundary solutions under Model [M4]. Since the MLE's in Model [M5] satisfy both the systems $A^T\hat{\alpha} = b$ and $A\hat{\beta} = b^*$, there are no boundary solutions under [M5] as well. Similar results hold for Models [M1] and [M2]. Hence, the conditions in Theorem 5.2 are not sufficient for the occurrence of boundary solutions under Models [M1]-[M5].
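
A short numerical check of this counterexample, again as a sketch assuming $\hat{\mu}_{ij11} = y_{ij11}$ (Models [M3]-[M5]) and with illustrative variable names: both conditions of Theorem 5.2 hold for every level, yet both linear systems have strictly positive solutions.

```python
import numpy as np

# Fully observed counts from Table 9 with the modified margins of Example 5.4.
A = np.array([[621, 290, 284],
              [260, 131, 117],
              [93, 30, 18]], dtype=float)
b = np.array([366, 156, 125], dtype=float)      # modified y_{+j21}
b_star = np.array([135, 60, 15], dtype=float)   # modified y_{i+12}

# Right-hand sides of Conditions 1 and 2 in Theorem 5.2:
# sum over the off-diagonal terms M[i, j] * v[j] / M[j, j].
bound1 = A.T @ (b / np.diag(A)) - b
bound2 = A @ (b_star / np.diag(A)) - b_star
conditions_hold = bool(np.all(b < bound1) and np.all(b_star < bound2))

# Solutions of A^T alpha = b and A beta = b*.
alpha = np.linalg.solve(A.T, b)
beta = np.linalg.solve(A, b_star)
no_boundary = bool(np.all(alpha > 0) and np.all(beta > 0))
print(conditions_hold, no_boundary, np.round(alpha, 4), np.round(beta, 4))
```

The necessary conditions are met, but no component is negative, so they cannot be sufficient.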

5.5. Importance of the necessary conditions. Here, we give additional details about Theorem 5.2 and discuss its simplicity and effectiveness.

From Theorem 5.2, note that if $\{y_{i+12}\}$, $\{y_{+j21}\}$, and/or $\{\hat{\mu}_{ii11}\}$ are large, then Conditions 1 and 2 may not hold. Indeed, if the inequalities in Conditions 1 and 2 are reversed for all $1 \le i \le I$ and $1 \le j \le I$, then from statements (a), (b) and (c) of Theorem 5.2, boundary solutions do not occur on fitting Models [M1]-[M5] in an $I \times I \times 2 \times 2$ incomplete table.

It is known that when boundary solutions occur, perfect fit models (here Models [M3], [M4] and [M5]) cannot reproduce the observed counts, indicating poor fit and imprecision of the parameter estimates. The MLE's of the parameters under NMAR models lie on the boundary of the parameter space and the log likelihood function tends to be flat, which makes derivation of the MLE's computationally intensive. Also, the corresponding covariance matrix has unreasonable eigenvalues (either close to zero or negative), which implies that the estimated standard errors for some parameter estimates are large. Hence, for model selection, we prefer NMAR models which do not yield boundary solutions upon fitting them to the given data.

Theorem 5.1 provides conditions which help us identify the occurrence of boundary solutions. However, boundary solutions may occur under some NMAR models even if the sufficient conditions in Theorem 5.1 do not hold. This implies that Theorem 5.1 cannot always provide us the set of plausible NMAR models for model selection. However, note that Theorem 5.2 is very useful in this regard since it gives us an insight into verifying the non-occurrence of boundary solutions under each of the NMAR models [M1]-[M5]. That is, if the relevant necessary condition in Theorem 5.2 does not hold, then we know for sure that boundary solutions do not occur. This always helps us to obtain the list of candidate NMAR models suitable for fitting the given data. Hence, Theorem 5.2 is more reliable than Theorem 5.1 for the purpose of model selection in square two-way incomplete tables.

The non-boundary MLE's of $\mu_{ij11}$ are $\hat{\mu}_{ij11} = \dfrac{y_{ij11}\, y_{i+1+}\, y_{++11}}{y_{i+11}\, y_{++1+}}$ under Model [M1], $\hat{\mu}_{ij11} = \dfrac{y_{ij11}\, y_{+j+1}\, y_{++11}}{y_{+j11}\, y_{+++1}}$ under Model [M2], and $\hat{\mu}_{ij11} = y_{ij11}$ under Models [M3], [M4] and [M5] (see pp. 647-648 of Baker et al. (1992)), which involve only the observed cell counts and their sums. Hence, from Theorem 5.2, there is no need to solve any system of likelihood equations, use the EM algorithm or compute odds (based on the observed (joint/marginal) cell counts) to check for the non-occurrence of boundary solutions in an $I \times I \times 2 \times 2$ incomplete table.

Remark 5.1. If $A_D = \mathrm{diag}(a_{11}, \ldots, a_{II})$, then from Kaykobad (1985), the solution $\alpha = (\alpha_{i\cdot})$ of the system $A^T\alpha = b$ may be obtained iteratively as follows.

$\alpha^{(0)} = A_D^{-1}b,$

(5.16) $\alpha^{(n+1)} = \alpha^{(n)} + A_D^{-1}(b - A^T\alpha^{(n)}), \quad n = 0, 1, 2, \ldots.$

Similarly, the solution $\beta = (\beta_{\cdot j})$ of the system $A\beta = b^*$ may be obtained iteratively as follows.

$\beta^{(0)} = A_D^{-1}b^*,$

(5.17) $\beta^{(n+1)} = \beta^{(n)} + A_D^{-1}(b^* - A\beta^{(n)}), \quad n = 0, 1, 2, \ldots.$

Both the sequences (5.16) and (5.17) converge to the solutions of the respective systems.
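
The iteration in (5.16) and (5.17) can be sketched as below. The function name `kaykobad_iterate`, the fixed number of steps and the $2 \times 2$ system are assumptions made for illustration. The toy system is chosen to satisfy condition (5.13) of Lemma 5.1, which is the setting in which this sketch assumes convergence; the count matrix of Table 9 does not satisfy that condition, as Example 5.3 shows.

```python
import numpy as np


def kaykobad_iterate(M, rhs, steps=200):
    """Iterate x <- x + D^{-1} (rhs - M x), starting from x = D^{-1} rhs,
    where D is the diagonal part of M."""
    M = np.asarray(M, dtype=float)
    rhs = np.asarray(rhs, dtype=float)
    d = np.diag(M)
    x = rhs / d
    for _ in range(steps):
        x = x + (rhs - M @ x) / d
    return x


# Hypothetical toy system satisfying (5.13): 5 > 1 * 4/3 and 4 > 1 * 5/4.
M_toy = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
rhs_toy = np.array([5.0, 4.0])

x_iter = kaykobad_iterate(M_toy, rhs_toy)
x_direct = np.linalg.solve(M_toy, rhs_toy)
print(x_iter, x_direct)
```

For the system $A^T\alpha = b$ one would pass the transposed matrix as `M`; for $A\beta = b^*$ the matrix itself.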

6. Conclusions 

In this paper, we have discussed the problem of boundary solutions that occur under various NMAR models for an $I \times J \times 2 \times 2$ table. We formally define boundary solutions for such a table and provide a result that connects various forms of these solutions under alternative parametrizations of the missing data models. This eliminates the need to use the EM algorithm for verifying their occurrence. The above result is then used to improve a claim in Baker et al. (1992) regarding the occurrence of boundary solutions. We give the precise form of such solutions by just noting the corresponding level(s) of the variable(s) in the table, which reduces computational burden.

As discussed earlier, boundary solutions pose several problems for estimation and inference under NMAR models in incomplete tables. Hence, it is important to investigate sufficient and necessary conditions for their occurrence in such tables. We have provided a result on the sufficient conditions for the occurrence of boundary solutions in an $I \times J \times 2 \times 2$ table. We use a similar approach but give direct arguments instead of the contrapositive ones used by Park et al. (2014) for proving it. Kim and Park (2014) conjectured that these conditions would also be necessary for general two-way incomplete tables. However, we show by a counterexample that this is not the case for $I, J \ge 3$, thereby disproving the conjecture.

We have also established necessary conditions for the occurrence of boundary solutions in a square $I \times I \times 2 \times 2$ table, which have not been discussed in the literature so far. As discussed in Section 5.5, these conditions are of practical utility to identify the non-occurrence of boundary solutions and hence for model selection. However, we show by a counterexample that these conditions are not sufficient. Note that a major advantage of the proposed sufficient conditions and necessary conditions is that they depend only on the observed cell counts in the table or their sums. As mentioned in Park et al. (2014), this makes the verification process much easier, and avoids using the EM algorithm or solving likelihood equations. Finally, all the above results are illustrated using several data analysis examples. It would be helpful to obtain a set of conditions involving only the observed cell counts, which are sufficient as well as necessary for the occurrence of boundary solutions in two-way incomplete tables with both variables missing.